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Abstract 

The increasing popularity of streaming video is a cause for 
concern for the stability of the Internet because most streaming 
video content is currently delivered via UDP, without any end- 
to-end congestion control. Since the Internet relies on end sys- 
tems implementing transmit rate regulation, there has recently 
been significant interest in congestion control mechanisms that 
are both fair to TCP and effective in delivering real-time streams. 

In this paper we design and implement a protocol that at- 
teaq)ts to maximize the quality of real-time MPEG-4 video 
streams while simultaneously providing basic end-to-end con- 
gestion control. While several adaptive protocols have been pro- 
posed in die literature P3, 31], the unique feature of pur proto- 
col, the Video T^sport Protocol (VTP), is the use of receiver 
side bandwidth estinnatioo. Such estimation is transmitted to the 
sender and enables it to adapt to network conditions by altering 
its sending rate and the bitrate of the transRiitted video streanL 
We deploy our protocol in a real network testbed and extensively 
study its behavior under varying link speeds and background 
traffic profiles using the FreeBSD Dununynet link emulator [26]. 
Our results show that VTP delivers consistent quality video in 
moderately congested networks and fairly shares bandwidth with 
TCP in all but a few extreme cases. We also describe some of 
the challenges in implementing an adaptive video streaming pro- 
tocol. 



1 hitroduction 

As the Internet continues to grow and mature, transmission 
of multimedia content is expected to increase and compose a 
large portion of the overall data traffic. Film and television dis- 
tribution, digitized lectures, and distributed interactive gaming 
applications have only begun to be realized in today's Internet, 
but are rapidly gaining popularity. Audio and video streaming 
capabilities will play an ever-increasing role in the multimedia- 
rich Internet of the near ftiture. Real-time streaming has wide 
applicability beyond the public Internet as well. In military and 
commercial wireless domains, virtual private networks, and cor- 
porate intra-nets, audio and video is becoming a conunonplace 
supplement to text and still image graphics. 

CunrenUy, conunercial programs such as RealPiayer [22] and 
Windows Media Player [19] provide the predominant amount of 



the streamed media in the Internet The quality of the content 
delivered by these programs varies, but they are generally asso- 
ciated with low resolution, small frame size video. One reason 
these contemporary streaming platforms exhibit limited quality 
streaming is their inability to dynamically adapt to traf^c condi- 
tions in the network during a streaming session. Although the 
aforementioned applications claim to be adaptive, there is no 
conclusive evidence as to what degree of adaptivity they employ 
as they are proprietary, closed software [23]. Their video streams 
are usually delivered via UDP with no transport layer conges- 
tion control. A laige-scale increase in the amount of streanodng. 
audio/video traffic in the Internet over a firamewoik devoid of 
oidrto-end congestion control will not scale, and could poten- 
tially lead to congestion coll^se. 

UDP is the transport protocol of choice for video streao^ 
ing platforms mainly because the fuDy reliable and strict in- 
'order delivery semantics TCP do not suit the real-time nature 
of video transmission. Video streams are loss wlerant and delay 
sensitive. Retransmissions by TCP to ensure reliability intro- 
duce latency in the delivery of data to the application, which in 
turn leads to degradation of video image quality. Additionally, 
the steady state behavior of TCP involves the repeated halving 
and growth of its congestion window, following the well known 
Additive Increase/Multiplicative Decrease (AIMD) algorithm. 
Hence, the throughput observed by a TCP receiver oscillates 
under normal conditions. This presents another difficulty since 
video is usually streamed at a constant rate (VTP streams are ac- 
tually piecewtse-constant). In order to provide the best quality 
video with minimal buffering, a video stream receiver requires 
relatively stable and predictable throughput 

Our protocol, the Video Transport Protocol (VTP), is de- 
signed with the primary goal of ad^ting an outgoing video 
stream to the characteristics of the netwoilc path between sender 
and receiver. If It determines there is congestion, the VTP sender 
will reduce its sending rate and the video encotfing rate to a level 
the network can accommodate. This enables VTP to deliver a 
larger portion of the overall video stream and to achieve inter- 
protocol fairness with competing TCP traffic. A secondary goal 
of VTP is the minimal use of network and end system resources. 
We make several trade-offe to limit processing overhead and 
buffering requirements in the receiver. In general, VTP follows 
a conservative design philosophy by sparingly using bandwidth 
and memory during the streaming session. 

In essence, the VTP sender asks the receiver the question: are 
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you receiving at least as fast as I am sending? If so, the sender 
increases its rate by a small amount to probe the network for 
unused bandwidth. If not, the sender immediately reduces its 
rate by an amount based on the receiver's bandwidth, the current 
sending rate and video bitrate. 

An important aspect of VTP is that it is completely end-to- 
end. VTP does not rely on QoS functionality in routers, random 
early drop (RED), other active queue management ( AQM) or ex- 
plicit congestion notification (ECN). It could potentially benefit 
from such network level fociiities, but in this paper we focus 
only on the case of real-time streaming in a strictly best effort 
network. Possible interactions between VTP and QoS routers, 
AQM or ECN is an area of future work. 

VTP is implemented entirely in user space and designed 
around open video compression standards and codecs for which 
the source code is freely available. The functionality is split 
between two distinct components, each embodied in a separate 
software library with its own API. The components can be used 
together or separately, and are designed to be extensible. VTP 
sends packets using UDP, adding congestion control at the ap- 
plication layer. 

This paper discusses related work in the next section and 
presents an overview of the MPEG-4 compression standard in 
Section 3. The VTP design is described in Section 4. Section 
S covers the VTP implementation, experiments and results. The 
conclusion and a brief discussion of fiiture work follow. 

2 Related Work 

Recent research approaches to address the lack of a suitable 
end-to-end service model for mtUtimedia streaming generally 
£all into two cat^ories: 1) modifications or enhancements to 
AIMD congestion control to better accommodate streaming ap- 
plications, or 2) model-based flow control based primarily on 
the results of [21]. We give several examples of each technique 
before presenting the mi^vation and design of VTP. 

The Rate Adaptation Protocol (RAP) [23] is a rate based 
AIMD protocol intended for transmitting real-time video. The 
• RAP sender uses feedback r^arding congestion conditions from 
the sender to make decisions about its sending rate and the trans- 
mitted video quality. The RAP algorithm does not result in fair- 
ness with TCP in many cases, but router support in the form of 
RED can improve RAP's inter-protocol behavior to some extent. 

A major difference between VTP and RAP is the degree to 
which th^ comply to AIMD. While RAP is a full AIMD proto- 
col, VTP performs additive increase but it does not decrease its 
sending rate multiplicatively. Rather, it adjusts its sending rate 
to the rate perceived by the receiver. RAP and VTP also dif- 
fer in the type of video encoding they stream. RAP is based on 
layered video encoding where the sender can decide how many 
layers can be sent at any gWen time. On the other hand, VTP 
assumes a discrete encoding scheme, where the sender chooses 
one of several pre-encoded streams and exclusively sends from 
that stream until It decides to change the video quality. Video 
compression is described in further detail in the next section. 

In the spirit of RAP, N. Feamster proposes SR-RTP [9, 10], 
a backwaid compatible extension to the Real Time Protocol 
(RTP). SR-RTP uses a quality adaptation mechanism similar to 



RAP. but **binomial" congestion control reduces die congestion 
window size proportional to the square root of its value rather 
than halving it in response to loss. This is shown to assuage os- 
cillations in the sending rate and produce smoother throughput 
Binomial algorithms also display a reasonable amount of TCP 
fairness [S]. 

The main benefits of SR-RTP come from two unique features: 
selective reuransmission of certain video packets, and decoder 
post-processing to conceal errors due to packet loss. However, 
the effectiveness of selective retransmission depends strongly on 
the round trip time (RTT) between sender and receiver. Further, 
in [10], the receiver post-processing is performed offline for ease 
of analysis. It is not clear such recovery techniques are viable in 
real time or with limited processing resources. 

The Stream Control Transmission Protocol (SCTP) [27] is a 
recently proposed protocol with many novel features designed 
to accommodate real-time streaming. SCTP supports muld- 
streaming, where a sender is able to multiplex several outgoing 
streams into one connection. This can potentially be very advan- 
tageous for compressed video formats since packets belonging to 
different parts of the video stream can be treated differently with 
respect to retransmission and order of delivery. The congestion 
control mechanism in SCTP Is identical to TCP, where die con- 
gestion window is reduced by half in the event of padcet loss. 
Lite TCP, SCTP employs slow start to initially seek out avail- 
able bandwidth and congestion avoidance to adapt to changing 
path conditions. This results in perfect fairness with TCP, but 
leads to high variability in throughput at the receive*. An inves- 
tigation of die applicability of SCTP to MPEG-4 streaming is the 
subject of [4]. 

The work of J. Padhye. et. al. [21] has led to TCP-Friendly 
Rate Control (TFRC) [13]. TFRC is not itself a protocol, but an 
algorithm for maintaining the sending rate at the level of a TCP 
flow under the same conditions. The TFRC sender adjusts its 
rate according to an equation that specifies throughput in terms 
of packet size, loss event rate, RTT, and the retransmission timer 
value. TFRC is meant to serve as a congestion control frame- 
work for any applications that do not require the full reliability 
of TCP and would benefit from low variation in sending rate. 

Application domains appropriate for TFRC include nmUime- 
dla streaming, interactive distributed games, Internet telephony, 
and video conferencing. Several authors have applied the TFRC 
model to video streaming. In [28], a new error-resilient video 
compression method is developed which relies on simplified 
derivation of die TCP throughput equation. The relationship be- 
tween the compression level and the congestion control model 
is examined. Ths Multimedia Streaming TCP-Friendly Protocol 
(MSTFP) is part of a comprehensive resource allocation strategy 
proposed in [31] which uses a TFRC model to adapt streaming 
MPEG-4 video. 

Ostensibly, any rate adjustment scheme derived from TCP 
would suffer the same limitations of TCP itself.* TCP's behav- 
iors of poor link utilization in high-loss environments and un- 
fairness against flows with large RTTs have been documented 
repeatedly (see, for example, [2]). These problems resurface in 
TCP-inspired stteaming protocols. 

^The TCP throughput equaUon In TFRC is derived for TCP New 
Reno in paiticulor. 
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Although VTP decreases its sending rate in response to packet 
loss, the decrease decision, as will be shown later, does not as- 
sume that all packet loss is a result of overflowed router buffers. 
At the same time, the amount of decrease is sufficient to restrict 
the sending rate to widiin its fair share of the network bandwidth. 
In this paper we argue that it is possible to build a stable and 
scalable network protocol that is not subject to the limitations of 
TCP. VTP borrows the idea of additive increase from AIMD, but 
it uses a rate estimation based decrease instead of a multiplica- 
tive decrease. 



3 MPEG-4 Background 

The MPEG-4 video compression specification [14, 20] has 
been developed as an open standard to encourage interoperabil- 
ity and widespread use. MPEG-4 has enjoyed wide acceptance 
in the researdi conununity as well as in commercial develop- 
ment owing to its high bitrate scalability and compression ef- 
ficiency. Packetizatidn maikers in the video bitstieam are an- 
other feature which make MPEG-4 especially attractive for net- 
work video transmission. MPEG-4 is' a natural choice for VTP 
since abtmdant documentation exists and numerous open source 
codecs are available. Like other MPEG video compression tech- 
niques, MPEG-4 takes advantage of spatial and temporal redun- 
dancy in individual frames of video to improve coding efficiency. 
A unique capability of MPEG-4 is support for object-based en- 
coding, where each scene is decomposed Into separate video ob- 
jects (VOs). A typical example of the use of object based encod- 
ing is a news broadcast, where the news person is encoded as a 
separate foreground VO while the background images compose 
another object. VO motion is achieved by a progression of video 
objea planes (VOPs). 
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IHgure 1: Group of Visual Object Planes (GOV) in MPEG-4. 



There are three different types of VOPs in the MPEG-4 for- 
mat: (1) Intra-coded VOPs (I- VOPs) that are encoded indepen- 
dently and can be considered "Icey" VOPs; (2) Predicted VOPs 
(P-VOPs) that depend on preceding I- or P-VOPs and cont^n 
predicted motion data and information about the error in the 
predicted values; and (3) Bi-directionally predicted VOPs (B- 
VOPs) that depend on both previous and next VOPs. Figure 1 
shows a sequence of MPEG-4 VOPs, known as a Group of Video 
Object Planes (GOV), with the dependencies represented above 



each plane. If a VOP upon which otiier VOPs depend is dam- 
aged during network transmission, decoding errors will manifest 
in the damaged VOP as well as all its dependent VOPs, a phe- 
nORienon known as propagation of errors. RFC 3016^ .describes 
a structured packetization scheme that improves eiror resiliency, 
making error concealment and error recovery more effective to 
counteract error propagation. 
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Figure 2: Macroblocks and slices in MPEG-4. 



The fundamental processing unit in MPEG-4 is a I6x 16 block 
of pixels called a macroblock. IPigure 2 shows a typical VOP 
composed of rows of macroblocks called slices. Macroblocks 
from P-, and B-VOPs contain different kinds of data that re- 
flect the particular dependency relationships of the VOP. A dis- 
crete cosine transform (OCT) is applied to each macroblock, and 
the resulting 16x16 matrix is then quantized. The range of the 
quantization parameters (QPs) is normally from 1 to 31. with 
higher values indicating more compression and lower quality. 
Ultimately, the bitrate of an MPEG-4 video stream is governed 
by die quantization scale of each OCT unformed macroblock. 

Q. Zhang, et. al. [31] exploit this object based encodii^ struc- 
ture by using network feedback to choose different quantizers for 
each VOP in real time. Foreground (more important) and back- 
ground (less unpoitant) VOPs are weighted unequally, with QP 
values selected so chat the quality of the background VOs is sac- 
rificed first in times of congestion. The ranges of all quantizer 
values are such that the sum of bitrates of all the VOP streams 
equals the target bitrate of the whole video stream. 

In contrast, VTP achieves adaptivity through a less complex 
approach with considerably looser semantics and lighter pro- 
cessing requirements. VTP is founded on the technique of dis- 
crete video encoding, where each video level is independent of 
the others. Each frame in the discrete encoded stream consists 
of only one rectangular VOP of fixed size,^ which implies a one 
to one correspondence between VOPs and frames. In this sense, 
the MPEG-4 codec in VTP performs like a conventional frame- 
based encoder. In the remainder of this paper we use the terms 
"VOP** and "frame" interchangeably. 

The VTP sender determines .from which discrete stream to 
send video data based on receiver feedback, and sends from that 



^http://www.faqs.oig^rfcs/irfc30l6.html 

^That is, there is only one video objea in every scene. 
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level exclusively until a decision is made to change. The QPs 
across all frames in a single level are all within a pre-deiined 
range and the frame pattern is the same in every level. In effect, 
VTP adapts to one of the pre-encoded quantization scales In the 
video source instead of computing the quantizers in real time 
during the streaming session. 

4 The Video Transport Protocol 

A typical video streaming server sends video data by divid- 
ing each frame into fixed size packets and adding a header con- 
taining, for example, a sequence number, the time the packet 
was sent, and the relative play out time of the associated frame. 
Upon receiving the necessary packets to re-assemble a frame, 
the receiver buffers the compressed frame for decoding. The de- 
compressed video data output from the decoder is then sent to 
the output device. If the decoder is given an incomplete frame 
due to packet loss during the transmission, it may decide to dis- 
card the frame. The mechanism used in the discarding decision 
is highly decoder-specific, but the resulting playback jitter is a 
universal effect. As predicted frames depend on key frames, dis- 
carding a key frame can severely reduce the overall frame rate. 

The primary design goal of VTP is to adapt the outgoing video 
stream so that, id times of netwoiic congestion, less video data 
is sent into die networic and consequently fewer packets are lost 
and fewo* frames are discarded. VTP rests on the underlying 
assumption diat the smooth and timely play out of consecutive 
' ' frames is central to a human observer's perception of video qual- 
ity. Although a decrease in the video bitrate noticeably produces 
images of coarser resolution, it is not nearly as detrimental to 
the perceived video quality as inconsistent, start-stop play out 
VTP capitalizes on this idea by adjusting both the video bitrate 
and its sending rate during the streaming session. In order to 
tailor the video bitrate, VTP requires the same video sequence 
' - to be pre-encoded at several dii^erent compression levels. By 
switching between levels during the stream, VTP makes a fun- 
damental trade-off by increasing the video compression in effort 
to preserve a consistent frame rate at the client 

In addition to maintaining video quality, the other important 
factor for setting adaptivity as the main goal in the design is 
inter-protocol fairness. Unregulated networic flows pose a risk 
to the stability and performance of the Internet in their tendency 
to overpower TCP connections that carry the large majority of 
traffic. While TCP halves its window in response oongesdon, 
unconstrained flows are under no restrictions as to die amount of 
data they can have in the network at any time. VTPs adaptivity 
attempts to alleviate this problem by imeracting fwly with any 
competing TCP flows. 

The principal features of this design, each described In the 
following subsections, can be sunmiarized as follows: 

1. Communication between sender and receiver is a "closed 
loop," i.e. the receiver sends acknowledgments to the 
sender at regular intervals. 

2. The bandwidth of the forward path is estimated and used 
by the sender to determine the sending rate. 

3. VTP is rate based. There is no congestion window or slow 
start phase. 



4.1 Sender and Receiver Interaction 

VTP follows a client/sever design where the client Initiates a 
session by requesting a video stream from the server. Once sev- 
eral initialization steps are completed, the sender and receiver 
communicate in a closed loop, with the sender using the ac- 
knowledgments to determine the bandwiddi and RTT estimates. 
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Hgure 3 shows the VTP video header and acknowledgment 
or "control packet" formats. The symmenic design fiKilitates 
both bandwidth and RTT computation. The TYPE field is used 
by die sender to explicitiy request a control packet finom the re- 
ceiver. For every k video packets sent, die sender will mark the 
TYPE field with an ack request, to which die receiver will re- 
spond with a control packet. The value of lb is a server option 
diat is configurable at run time by the user. The two dmestamp 
fields for sender and receiver respectively are used for RTT mea- 
surement and bandwidth computation. VTP estimates the band- 
widdi available to it on the path and then calibrates its sending 
rate to the estimate, as detailed below. 

When the receiver receives a data packet with the TYPE field 
indicating it should send a control packet, it performs two simple 
operations. First, it copies the header of the video packet and 
writes Its timestanq> imo the appropriate fields. Second, it writes 
the number of bytes received since die last control packet was 
sent into the SIZE field. The modified video packet header is 
then sent back to the sender as a control packet 

Upon receipt of the control packet, the sender extracts die 
value In the SIZE field and die receiver tiniestamp. The sender is 
able to compute die time delta between control packets at the re- 
ceiver by keeping the value of one previous receiver timestamp 
in memory and subtracting it from die timestamp in the most re- 
cently received packet. The value of die SIZE field divided by 
this time delta is die rate curremly being achieved by diis su-eam. 
This rate is also die "admissible" rate since it is die rate at which 
data is getting through the path botdeneck. In essence, the mea- 
sured rate is equal to the bandwidth available to the connection. 
Thus, it is input as a bandwiddi sample into the bandwiddi esti- 



mation algorithm described in the next section. 

The sender uses its own timestamps to handle the RTT com- 
putation. When the sender sends a video packet with the TYPE 
field marked for acknowledgment, it remembers the sequence 
number. If the sequence number on the returning control packet 
matches the stored value (recall the receiver simply copies the 
header into the control packet changing only its own timestamp 
and the SIZE field), the sender subtracts the sender timestamp in 
the control packet from the current time to get the RTT sample. 

If either a data packet that was marked for acknowledgment 
or a control packet is lost, the sender notices a discrepancy in 
the sequence immbers of the anrtving control packets. That is, 
the sequence numbers to not match those that the sender has 
recorded when sending out video packets with ack requests. In 
this case, the sender disregards the information in the control 
packets. Valid bandwidth or RTT samples are always taken from 
two consecutively arriving control packets. 

4.2 Bandwidth Estimation and 
Rate Adjustment 

Bandwidth estinution is an active area of research in its own 
right [1 , 6, 7, 16]. In this paper we provide only a brief summary 
following [7). Recall from the previous section that the achieved 
rate sample bi can be dbtained by dividing the amount of data 
in the last Jb packets by the inter-arrival time between the current 
and ft — 1 previous packets. As a concrete example, suppose 
k =^ 4 and four packets arrive at the receiver at times <i, ...,t4 
each with di , ... , cl4 bytes of data respectively. The sum ]C<«i ^« 
is sent to the sender (in the SIZE field of the control packet). 

The sender, knowing from the last control packet and tA 
from the current control packet, computes 



Exponentially averaging the samples using the formula 

Bi = (a)B,_. (ii^t^) (2) 

yields the bandwidth.estlmate Bi that is used by the sender to 
adjust the sending rate. The parameter a is a weighting factor 
that determines how much the two most recent samples should 
be weighed against the history of the bandwidth estinmte. In 
experimental uials. it was determined that VTP performs best 
when a is a constant close to 1. Packet loss is reflected by a 
reduction in the achieved rate thus bandwidth estimate. Since the 
bandwidth estimadon formula takes Into account losses due to 
both congestion and random enors, using an exponential average 
prevents a single packet drop due to a link error from causing a 
steep reduction in the estimate. 

Through the estimate of the connection bandwidth* the VTP 
sender gains considerable knowledge about the conditions of 
the path. The sender uses the estimate as input into an algo- 
rithm that determines how fast to send the data packets and 
which pre-encoded video to send. We describe the algorithm 




Figure 4: VTP finite state machine with states and transitions involved 
in a video quality level increase represented with dashed lines. 



in terms of a finite state machine (FSM), shown in Figure 4. As- 
suming three video encoding levels, the states QO, QU and Q2 
each correspond to one distinct video level from which VTP can 
stream. We use 3 levels throughout this example for simplic- 
ity, but n > 3 levels are possible in general. Each of the IR 
states, IRO, 1R1» and IR2, represent increase rate states, and DR 
repiesents the decrease rate state. In Figure 4, the states and tran- 
sitions involved In a quality level increase are highlighted with 
dashed lines. 

Starting in state Q0» a transition to IRO is Initiated by the re- 
cepdon of a bandwidth estimate that is equal to or greater than 
the current sending rate. Being in state QO only implies the VTP 
server is sending the lowest quality level, it says nothing about 
the sending rate. In state IRO. Uie server checks several condi- 
tions. First, it checks if the RTT timer has expired. If it has not, 
die server returns to QO without taking any action and awaits 
the next bandwidth estimate. If one RTT has passed, it remains 
in IRO and investigates further. It next determines whether the 
sending rate is large enough to support the rate of the next high- 
est level (level 1 in this case). If not, the server increases die 
sending rate by one packet size and returns to state QO. If, on 
' die other hand, the sending rate can accommodate the next qual- 
ity level, die server checks die value of a variable we call "the 
heuristic." I 

The heuristic Is meant to protect against over ambitiously in- 
creasing the video quality in response to instantaneous available 
bandwidUi on the link diat is short-lived and will not be able to 
sustain die higher bitrate su^am. If the heuristic is satisfied, die 
server increases the sending rate by one packet size and transi- 
tions to state Ql. If the heuristic is not met, die server increases 
die rate by one packet and returns to state QO. In normal oper- 
ation, the server will cycle between states QO and IRO continu- 
ally examining the RTT timer, the bandwiddi estimate, and the 
heuristic and adjusting the sending rate. When conditions per- 
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mil, the transition to Ql occurs. The process repeats itself for 
each of the quality levels. 

In the current implementation the heuristic Is an amount of 
lime, measured in units of RTT, to wail before switching to the 
next higher level of video quality. Ideally, the heuristic would 
also take into account the receiver buffer conditions to ensure a 
video quality increase would not cause buffer overflow. Since 
the receiver is regularly relaying timestamp information to the 
sender, it would be expedient to notify the sender of the amount 
of buffer space available in the control packet. The sender would 
then be able to make the determination to raise the video quality 
with assurance that both the network and the receiver can handle 
the data rate increase. [24] examines the factors that need to be 
taken into account in quality changing decisions in detail. 

In a rate and quality decrease, the transition to DR is initiated 
when the server receives a bandwidth estimate less than its cur* 
rent sending rate. In DR, the server checks the reference rate 
of each constituent quality to find the highest one Ifaat can fit 
witiiin the bandwidth estimate. The server sets its sending rate 
* to the bandwidth estimate and transitions to die state correspond* 
ing to the video quality that can be supported. Unlike the state 
transitions to increase quality levels, the decrease happens im- 
mediately, with no cycles or waits on the RTF timer. This con- 
servative behavior contributes greatly to the fairness properties 
of VTP discussed in Section 5.S. 

As the FSM suggests, the selection of the encoding bitiates is 
important VTP observes the rule that a particular video encod- 
ing levd must be transmitted at a rate greater than or equal to its 
tutrate and will not send slower than the rate of the lowest qual- 
ity encocfing. This could potentially saturate the network and 
exacerbate congestion if die lowest video bitrate is fiiequentiy 
higher than the available bandwidth. Additionally, if the step 
size between each reference rate is laige, more data buffering 
is required at the receiver. This follows from the feet that laige 
step sizes lead to die condition v/hm VTP is sending at a rate 
that is considerably higher than the video biurate for long periods 
of time. 

4«3 Rate Based Congestion Control 

The stability of the Internet depends on tiie window based 
AI^4D algorithm of TCP. Any protocol that does not observe 
• : the AIMD scheme requires justification to be considmd viable, 
especially for large-scale deployment VTP has no congestion 
window, does not perform slow start, and does not halve its send- 
ing rate on every packet loss. However, VTP uses resources in 
a minimal way and relinquishes them on the first indication of 
congestion. Justification for the plausibility of VTP is based 
mainly on the practical observation that the threat to Internet 
stability is not posed by fiows using congestion control schemes 
that are non-compliant to AIMD, but rather by flows under no 
• end-system conurol at all - flows that are completely impervious 
to network conditions. 

It has not been proven that Internet stability requires AIMD, 
but some form of end-to-end congestion control is necessary in 
order to prevem congestion collapse [13]. Even though VTP Is 
not founded on AIMD, It is still able to fiairiy share links with 
TCP competitors as e^denced by the experimental results of 



Section 5.5. Inter-protocol fairness of VTP notwitiistanding, any 
end-to^nd mechanism tiiat limits the flow of tiie real-time traffic 
in an environment where it competes witii TCP Is advantageous 
from die perspective of feimess. Furtiteimore, unlike TCP, VTP 
Is umed at preserving minimum variance in delivery rate at die 
. receiver. Streaming applications dial eschew TCP due to its os- 
cillatory steady state nature can benefit from tiie smooth dellveiy 
rate of VTP while during times of congestion tiieir data load on 
the network will be judiciously constrained. 

By default, VTP performs a type of congestion avoidance: it 
increases its rate by a small amount on every estimated RTF. 
Nomuilly, die rate increase is one packet size per RTF, but it 
can be tuned to compensate for large RTTs. The gradual rate in- 
crease seeks out available bandwidth and enables VTP to ^Vamp 
up" the video quality if network conditions remain acconunodat- 
ing. This behavior parallels the additive increase phase of AIMD 
so that rate increases in VTP and TCP are comparable. 

Throughout the duration of the coimection, VTP estimates the 
forward path bandwidtii. If the bandwidth estimate foils below 
the sending rate, VTP takes this as an Indication of netwoik con- 
gestion and reduces its rate. In summary, the protocol behaves 
conservatively by slightiy increasing the send rate every RTT 
and cutting the rate inunediately upon the arrival of ^Imd news." 

5 Implementation and Experiments 

We implemented VTP on the Linux platform and performed 
extensive evaluations using the Dununynet link emulator [26]. 
We developed a technique to smooth die bandwidth required by 
the outgoing video stream and compute the client buffer require- 
ment for specific pre-encoded video segments. The goals of our 
experiments are to assess inter-protocol fairness between VTP 
and TCP and evaluate the quality of the transmitted video played 
by the client. In this section we cover the software implementa- 
tion of VTP, oiu" approach to client buffering, and the results of 
our experimental evaluation. 

■ 

5.1 Software Architecture 

The VTP Implementation effort has strived to build a fiilly 
functioning video streaming platform. VTP software accepts 
standard Audio/Video Imerieaved (AVI) files as input F6r each 
video segment, VTP requires nuiltiple AVI files, each of a dif- 
ferent level of MPEG-4 compression. TWo mmn functional units 
comprise the VTP architecture. A transport laye- component 
called NetPeer provides an interface that returns an estimate of 
the bandwidth share of the connection. A middleware compo- 
nent called FileSystemPeer manages the source video data and 
determines the sending rate based on the estimate provided by 
NetPeer. 

For each set of AVI files, a binary file is created that contains 
the discrete encoded video along with synchronization mark- 
ers to guide the server in selecting the right frame when a level 
change needs to be made. Audio and video portions of the AVI 
files are de-multiplexed in the process of creating the binary file 
and only the video data is stored and oransmitted. Streaming au- 
dio and video in combination with VTP is a subject of future 
research. Upon receiving the client's re«|uest to start a stream. 
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Figure S: VTP Software Architecture. 



the FileSystemPeer opens the binary file and begins to send data 
at the lowest quality encoding. As the session progresses, the 
FileSystemPeer changes the video level in response to the Net- 
Peer feedback. 

The client and server communicate over two separate sockets: 
one UDF socket for data, one UDP socket for control informa- 
tion. Timestamps are gathered using the Berkeley Packet Filter 
utility (BPF/ running in a separate thread to minimize the influ- 
ence of the data processing on the RTT value. The EPF allows 
the user mode player and server processes to collect timestamps 
at the network interface level that exclude the operating system 
and protocol overhead time. The minimum measured RTT dur- 
ing die connection is used as the RTT value in the rate adjust- 
ment algorithm. Figure 5 presents a functional diagram of the 
VTP software architeaure. Bach of the two server components 
of VTP is independent and could potentially be used with other 
software nnodules. Similarly, the ctient NetPeer is intended to 
. function as a generic plug-in to any software player diat sup- 
ports modular input. In this implementation we used the xine 
video player [30] for Unix systems. 

A VTP software server may be hnplemented easily 1^ linking 
the FileSystemPeer and NetPeer modules and providing a main 
routine to foim an executable. The client side NetPeer includes 
buffering capability to accommodate network level buffering of 
video data. 

The FileSystemPeer API provides two major functions: 

is.eof a getPacket(qual, buffer # size); 
rate = setRate(rtt_val, bwL.est, &qual) ; 

The get Packet function fills the buffer field with a 
iheader-and sl'ze'bytes of video data from video quality qual. 
where ^al corresponds to one of the pie-encoded compres- 
sion levels in the binary file. A flag indicating if this is the 
last packet in the file is returned. The setRate function real- 
izes the algorithm in Section 4.2. The values for the parameters 
rtt.val and bw..est are provided by NetPeer (see NetPeer 
API below). The last parameter, qual, is passed by reference 
and is set by the setRate function and used as Input in the next 
call to getPacket. It should be noted that both getPacket 
and setRate maintain state between calls. 

Avdiable fiom http:/Avww-nig.ee.lbl.gov/. 



The NetPeer API provides three functions: 

bw.est = getBWE ( ) ; 
rtt_val = getRTT ( ) ; 
senclData(rate, buffer); 

The sender uses the getBWE to get the latest bandwidth 
estimate from its NetPeer. Internally. NetPeer performs non- 
blocking reads on the control socket to obtain the latent acknowl- 
edgment from the receiver. From the information in the ack,^ it 
computes bandwidth estimate which is the return value of the 
function. The sendmg rate can then be computed by calling the 
setRate function of the FileSystemPeer with the bandwidth 
estimate as the second parameter. GetRTT returns the latest 
value of the RTT estimate. The sendData function determines 
the amount of time to wait from die rate parameter, and then 
sends the buffer containing the header and video data. 

In addition to these exported functions, several other func- 
tions are provided to handle connection initiation, opening the 
source video files, and other Initialization tasks. These functions 
are straightforwaFd and omitted for teevity. The h parameter, 
the value of the heuristic variable (in units of RTT), and the port 
numbers that VTP uses are all user configurable. 

5.2 Transmission Schedules for 
Variable Bitrate Video 

In a constant bitrate (CBR) video source, the quantization 
parameters are contiimously adjusted to maintain the target bi- 
trate of the overall video stream. This Is beneficial for network 
transmission, but leads to varying video quality from frame to 
friune and can have an impleasant effect on the viewer's per- 
ception. MPHG-4 preserves consistent quality by increasing the 
bitrate at times of high motion or detail, producing a variable bi- 
trate (VBR) encoding. In some instances the bitrate can change 
dramatically during the course of a video clip. The amount of 
rate variability is codec-dependent. In this research we investi- 
gated tiu^ee MPEG-4 video codecs: DivX 4.2 [8], FFmpeg 0.4.6 
[12], and Microsoft MPEG-4 version 2 [19]. After several initial 
tests, die Microsoft codec was found to be inappropriate for VTP. 
This codec uses an algorithm that drops entire frames to achieve 
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Figure 6: Cumulative amooni of bytes recdved, C(t), when the entire 
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the desired compression level, which conflicts with the VTP as- 
sumption of a similar frame pattern across the set of encodings. 
Moreover, dropping frames for the purpose of compression has 
other highly undesirable effects: inconsistent frame rates, short- 
ening of the duration of the video, and an awkward, ''jump/' 
playback. The rest of this section assumes that the video source 
is compressed with a codec that does not skip frames to affect the 
level of compression, such is the case with DivX and FFmpeg. 

Since it would be ineffective to transmit video data at uneven, 
bursty rates, we developed a method for determining a transmis- 
sion schedule for VBR MPEG-4 video that leads to a piecewise- 
constant nominal sending rate. By taking advantage of a pri- 
ori knowledge of the bitrates of the stored video files, the peak 
bandwidth requirements and rate variability of the transmission 
can be significantly reduced. An appropriate sending mte can 
be incrementally computed by. averaging the video bitrate over 
discrete intervals. 

Let V{t) represent the cumulative amount of bytes consumed 
1^ the client firom the start of the streaming session to time t. In 
other words, if the video is encoded at a variable rate i;(r), 

V[i)^i2v(T) (3) 

As a starting poim for a constant rate transmission plan, let C(t) 
be the cumulative amount of bytes received at the client under a 
very simple CBR schedule: the constant rate equal to the size of 
the entire video segment (in bytes) over the duration. 

Figure 6 shows V{t) and C{t) for a 16 MB, 130 second sam- 
ple MPEG-4 video from a scene of the movie **TRON" encoded 
with DivX. The function 

U{t) = C(t) - V(t) (4) 

in the bottom plot in Figure 6 leads to several basic observations 
that are of interest. Intuitively, {/(to) < 0 for a particular to 
signifies that transmitting the video su-eam at simply the aver- 
age bitfate of the entire segment would lead to buffer undemin 



Figtue 7: Cumulative amount of bytes received based on a piecewise- 
consiant rate sdiedule with ten segments of equal duration. 



at dme to. The maximum positive value of U{i) coiresponds 
to the largest bu£fer occupancy in bytes under the same constant 
transmission rate. A straightforward generalization of this ap- 
proach involves shortening the Interval over which the average 
is taken, and connecting several CBR 'Yuns" to form a sequence 
of transmission rates. 

Figure 7 shows the same video source dissected into ten inter- 
vals of equal length. Within each interval, the rate is computed 
as the average of the video bitrate as before. In this figiure, C(t) 
stays closer to the V(t) curve, resulting in smaller peaks in U(t). 
The use of ten segments in particular was found in experimental 
trials to be a good compromise between the length and number 
of separate intervals. Under this plan, the sender adjusts its send- 
ing rate ten times during the course of the stream. Each sentting 
rate is exaaly the slope of the line segment for the corresponding 
interval. The bottom plot shows that the condition U{t) < 0 still 
holds at around t s 60, i =s 110, and t ^ 125, indicating buffer 
undenuns at these limes for this sendit^ plan. The next section 
addresses eliminating these undemins. and finding the minimum 
for buffer size required in the case of equal length, constant rate 
intervals. 

S3 Minimizing Client Buffer Requirements 

The technique in the previous section can be extended to opti- 
mize the transmission schedule to ensure nunimal use of recover 
memory for video data buifering. Our approach follows closely 
die PCRTT algorithm of [18). Given the constunpdon rate V{t) 
and a buffer »ze d al the client, a successfid transmission rate 
would delWer at least V'(^) but not more than V(t) + 6 bytes to 
the client at any time t, as illustrated in Figure 8. 

To find the minimiun b required for a particular video stream 
while protecting against buffer underruns, we consider again the 
function U(t) from equation (4). The maximum of U{t) is the 
amount of data that the piecewise-constant rate plan will send 
ahead of the consumption rate. The client must allocate a buffer 
of at least max U{i) bytes of data to avoid data loss due to over- 
flowing buffers. The minimum-value of U{t) corresponds to the 
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greatest difference between the consumption rate and die server 
sending rate, i.e.» the point where the sender falls roost behind 
the receiver. 

If I min U{t)\ bytes could be transmitted before the time at 
which minU(t) occiu-s. underruns would be prevented. Sup- 
pose that a time is chosen such that | min C/'(^)| bytes of data 
is sent in the interval [0, tj] in addition to the amount of data 
that needs to be sent under the constant rate plan. This way, 
we add | imnU{t)\/ii bytes/second to the rate computed by the 
piecewise-constant method to all the rates that lie in the interval 

In the worst case, time tj can fall precisely when U(t) is at its 
maxinmm. The client would then have to be able to buffer both 
the maximum of U(i) and | min U(t)\ at the instant Hence, 
if a &min byte buffer is allocated at die client, where 

6min = I maxC/(<)| + \mmU(t)\ (5) 

both underruns and overruns will be prevented. The time tj must 
be chosen before the time that min U{t) occurs, but ideally it 
should be chosen to be before the time of the first negative value 
of U{t). Figure 9 shows the cumulative amount of bytes re- 
ceived, C(£), under the adjusted CBR ten-segment transmission 
plan for the TRON video sources with tj = 5. In the figure, 
C(t) is always above the consumption rate V(t). The value of 
6min ,for this video segment is is a relatively modest 2.7 Mbytes, 
which is required around time t ^ 17. In general, choosing a 
suitable tj depends on the size of | min U(jt)\ and the time at 
which*muiC/(t) occurs. If | mm{/(^)| is large, some care must 
be taken so that tj is not too small. Tliat is, the additional t^s 
that need to be sent are spread out over time and not sent in a 
burst at the beguming of the streaRL 

With discrete encoding, each video is stored on disk as sev- 
eral disdnct streams differing in their level of compression. Each 
stream has an associated 6m!n and the transition points between 
segments occur at the same time points In each level. The 6min 
for the whole segment is simply chosen to be the maximum 6m{n 
of the constituent streams. Since the sending rates for all the 
video levels are pre-computed, this value of 6min is known be- 
fore any video data is sent. The next figures present the encoded 
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Figure 9: Cumulative bytes received. C(t), under the piecewise CBR 
transmission schedule computed for 'TRON.'* 



bitrates and resulting sending rate profiles for the two sets of 
video sources used throughout the experimental evaluation of 
VTP. 

The left plot of Figure 10 shows the encoded bitrate of three 
levels of quantization for the *TRON" video segment. The bi- 
trates of this DivX compressed MPEG-4 video exhibit a great 
deal of variability - from 500 Kbps to more than 4.5 Mbps in the 
case of the stream with QPs in the 2 to 10 range. The correspond- 
ing piecewise-constant sending rates computed with tj = 5 are 
shown in the right plot of Figure 1 0. The peak rate is reduced sig- 
nificantly to around 1 .6 Mbps. The left plot in Figure 1 1 presents 
die bitrate trace for a trailer for the movie "Adamls" compressed 
with the FFmpeg MPEG-4 video codec. The FFmpeg codec pro- 
duces, video diat is maritedly less variable rate than DIvX. An- 
other Imeresting point to note is files created with FFmpeg are 
smaller and use less bandwidth than DhrX for the same QPs. The 
right graph in Figure 1 1 shows the rate profile for the "Atlantis** 
sequence, with t^ again set at 5 seconds. 

An alternative and often used approach to "pre-sending** extra 
bytes for protecting against underruns is to delay the initial play- 
back at the client while it accunuilates some amount of buffered 
video data. However, the video player usually has its own re- 
quirements for buffering In addition to buffering done at the net- 
worie level. As can be seen in the ardiitecture diagram (Fig- 
ure 5), the player buffers data between the codec and the video 
output system to synchronize the display of consecutive frames. 
Buffers also need to absorii the small time scale rate changes due 
to variance in delay or *'jitter." Since VTP Is designed modulariy 
to operate with many video players, it does not place any restric- 
tions on the player with regard to play out start time. VTP offers 
6min as a guard against buffer overruns and underruns resulting 
from differences between the sending rate and consumption rate. 
The decisions of exactly how much buffer space to allocate and 
when to start play out are left to the player. 




s 



1.8 
1.6 
1.4 
1.2 
1 

0.8 
0.6 
0.4 
0.2 




OP range 2 - 10 
QPranoe^6.^o 
QP range 30- Si 



f 



20 40 60 80 

socomls 



100 



J L 



120 



140 



Figure 10: Source bltrates (left) and sending xate prpfile (right) produced for TRON.** 



soo p 



800 " 



a. 



700 
600 
500 



400 



QPrange2-10 — 
QPianoe16-20 — 
QPfangoao-ai 




700 



6Q0 




Figure 1 1: Source bitrates (left) and sending rate profile (right) produced for "Atlantis " 



5.4 Basic Protocol Behavior 

One of the main goals of VTP Is to fairly share network re- 
sources with other traffic. VTP attempts to achieve fEumess with 
TCP by reducing its sending rate whenever the bandwidth esti- 
mate indicates that the current sending rate cannot be supported. 
Depending on the difference between the estimate and the cur- 
rent sending rate, VTP can take several steps to back off, freeing 
network resources to ensure other Hows obtain an even share of 
the link. . 

• Figure 12 shows the behavior of VTP sending the ''Atlantis'* 
segment isolated on a 10 Mbps link with a 10 millisecond RTT 
This single connection an is unlikely scenario but it clearly illus- 
trates VTP progresdng through its rate change algorithm. The 
plot on the left displays the sending rote and computed band- 
width estimate, while the plot on the right displays which pre- 
encoded video stream VTP Is sending at the corresponding time. 

Each video packet contains 1 Kbyte of video data and the k 
parameter, which determines how often to send control packets, 
is set to 5. In experimental trials we found these settings strike 
a balance between minimizing protocol overhead resulting from 



acknowledgments and the need to keep packet sizes small to pro- 
mote even packet flow. The so-called heuristic variable, which 
tells VTP how long to wait before moving to the next higher 
video quality, is set to 2 RTTs. 

In the initial phase of the "Atlantis" session, the protocol starts 
sending the video segment at the rate of the transmission sched- 
ule for the lowest quality video. Since there is plenty of band- 
width available on the free 10 Mbps link. VTP raises its sending 
rate and the quality of the video stream. By about time 7 sec- 
onds, the highest quality video is being sent (with QPs in the 2 
to 10 range). For the remainder of the flow. VTP sends the high- 
est video quality at the rate prescribed in the transmission plan 
- with the exception of times 12 and 30 seconds. At these times 
VTP reduces the video quality one level for a brief time and then 
returns to sending the high quality video. 

The reason behind these quality *Valleys" can be understood 
by referring to the "Atlantis" transmission plan, the right dia- 
gram of Figure II. According to the plan, the rate requirement 
for the highest video quality suddenly increases by roughly 100 
Kbps at about 14 and again at 34 seconds. In the interest of 
fairness. VTP does not instantaneously increase its rate by such 
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large amounts. Instead, it switches to sending video that is one 
quality level lower, and continues to probe for available band- 
width by increasing the sending rate by 1 padcet per RTT. After 
1 second, the sending rate reaches the rate required for the high- 
est video level and the heuristic is satisfied. This allows VTP to 
switch back to the highest quali^ video. A threshold is applied 
to the sending rate so that if the difference between the sending 
rate and the reference rate is small, the VTP server can increase 
its rate without performing bandwidth exploration. This hap- 
pens, for example, at 21 seconds in Figure 12. This way, VTP 
conservatively favors fairness when the prescribed rate increase 
is large, but it does not irapidly change video streams on every 
minor rate adjustment in the send plan. The threshold is config- 
urable by the user at run time. In this experiment, the threshold 
was set to 1 Kbps. 



5.5 Fairness with TCP 

The remaining experiments were designed to quantitatively 
measure how much bandwidth TCP and VTP attain when com- 
peting directly with each other. We streamed both the '*TRON" 
and "Atlantis" video sources in various scenarios differing 
mainly in link capacity and number of competing TCP connec- 
tions. The experiments were performed using a relatively simple 
network topology in which two independent LANs were con- 
nected through a PC running FreeBSD acting as a gateway. The 
Dummynet utility and the Iperf program^ were used to vaiy the 
link capadiy and generate background TCP traffic respectively. 
In thisr environment all packets arrive in order, so any gap is s&> 
quence numbers can immediately be interpreted by the receiver 
as packet loss. 

Figure 1 3 presents the normalized throughput of VTP sending 
die "Atlantis'* segment on a 3 Mbps, 10 ms RTT link with various 
numbers of TCP'fiows. Each column of data points represenu a 
separate experiment where a single VTP flow and several TCP 
flows share the link. The x axis is labeled with total number of 
flows (e.g. the column labeled "16" is the result of one VTP and 
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Figure 13: Single VTP flow compeUng with TCP on a 5 Mbps link. 
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15 TCP flows). The normalized throughput is computed by sim- 
ply dividing the average bandwidth received by each flow by the 
fair share bandwidth value for each case. Perfect inter-protocol 
fairness would be exhibited by both VTP and TCP scoring a nor- 
malized throughput of 1. The vertical bars show the standard 
deviation of die TCP bandwidth values for cases where diere is 
more than 1 TCP coimection. 

In the case of 2 connections, TCP obtains much more band- 
width simply because VTP has no need to transmit faster than 
about 450 Kbps, the average rate of the sending plan for the 
highest video quality (see I^guce II). As the nuinber of con- 
nections Increases, VTP and TCP compete for the limited re- 
sources of tiie link. VTP shares the link relatively foirly except 
for the case of 32 coimections. In this case, the £air share value is 
3000/32 ss 93.75 Kbps, which is roughly tiu'ee quarters of die 
rate of the lowest video quality according to Figure 1 1. Since 
VTP does not send slower that the rate of the transmission plan 
fbr the lowest video quality (about 125 Kbps according to Figure 
11) it uses slighUy more than the fair share value of Uie band- 
width. It is important to note that diis unfairness is not an inher- 
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ent limitation of VTP» but a circumstance of the lelationship be- 
tween the link capacity and the video encoding. The case where 
VTP shares the link with 7 TCP connections results in near per- 
fect foimess. 

In Figure 14, VTP sends the 'TRON" video segment on a 5 
Mbps, 10 ms RTF link against background TCP traffic- The 
'TRON" send plan requires significantly higher bitrates than 
"Atlantis,** thus we set the link speed correspondingly higher. 
The "TRON" transmission plan also contains larger instanta- 
neous jumps in send rate - as much as 1 Mbps for the highest 
video quality (see Figure 10). Both of these differences are a re- 
sult of the dissimilar bitrate profiles produced by the DtvX and 
FFmp^ codecs, as evident in Figures 10 and 11. 

Figure 14 shows that VTP uses less than or equal to its fair 
share of bandwidth in all cases except that of 16 connections, 
where again the link limitation is reached. The figure veriiiQ^ the 
"Atlantis^* experiments: that VTP behaves fairly, in some cases 
generously leaving bandwidth unused, if its bandwidth share al- 
location is at least enough to stream the lowest quality of video. 

In summary, we have demonstrated that VTP uses network 
resources fairly when fiacing competition from the AIMD based 
congestion control of TCP. In lightly loaded networks, VTP uses 
only the bandwidth required to transmit at the rate of the highest 

' qusdity video stream, the remaining bandwidth can be claimed 
by other connections. In environments of moderate congestion, 
VTP fairly shares the link as long as its fair share is at least the 
rate of the lowest quality video. Additionally, VTP's fairness 

. properties are not codec specific, and it is able to maintain sta- 
ble sending rates when streaming source video with significantly 
different transmission plans. 

5.6 Video Quality 

An accurate, well-defined, and widely accepted standard for 
measuring the application-level perceived quality of network 
transmitted, compressed video does not exist in the literature at 
this time. In [17], the authors detail the problematic and subjec- 
tive nature of quality assessment and discuss the shortcomings of 



several existing approaches. [15] suggests gathering a group of 
viewers in a room with specialized equipment to subjectively as- 
sign grades to the video tiiey observe. Extracting a general quan- 
titative measure finom this type of assessment would be nearly 
impossible. The American National Standards Institute (ANSI) 
has produced a specification of parameters for quality degrada- 
tion (blurring, distortion, tiling, etc.) [3]. However, these mea- 
sures are focused entirely on quality degradation due to com- 
pression, not packet loss. Degradation due to loss is transient 
in nature and only affects part of the ftame, whereas degrada- 
tion due to compression invariably affects the whole frame. The 
ANSI quality parameters are insensitive to severely degraded or 
missing p^tm of frames, which are very noticeable to the human 
viewer. 

Another commonly used metric for attempting to objectively 
measure video quality is the Peak Signal to Noise Ratio (PSNR), 
which is the pixel-by-plxel difference between the original and 
degraded images in one of the chrominance or luntinance com- 
ponents. PSNR is defined in terms of the root mean squared 
error (RMSB) as PSNR = 20 log|o(255/RMSE). For an 8 bit 
image componem of a degraded n by m frame f from an origi- 
nal frame /. 
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Figure IS shows the evolution of Uie PSNR of die luminance 
component of tiie "Atiantiai" segment from firames 4(X) to 7(X), 
indicating die effect of increasing the quantization in terms of 
PSNR. The chart represents how PSNR can be a suitaUe incUca- 
tor of changes In quality due to vaiying levels of compression. 
Using PSNR as a measure of perceived quality of the transmitted 
and client-displayed video, however,' is wrought wdth difficulties. 
First, it is widely known tiiat PSNR values do not correspond 
well with the characteristics of the human visual system, mak- 
ing it a poor gauge of perceived quality. Second, codecs that 
skip frames to affect compression can easily yield video that has 
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Figure 16: Frame rate of received ''Atlantis'* stream using VTP and Nan-Adaptive Streaming. 
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Figure 17: Frame rate of received 'TRON** stream with VTP and Noa-Adaptive Streaming. 



a veiy high average PSNR per frame, but looks inferior in play- 
back when compared with video with a lower average PSNR 
where the frame rate is held constant Lastly, VTP draws video 
from the different pre-encoded streams as it progresses through 
its congestion control algorithm. Each received frame would 
have to be matched to its source stream to compute the correct 
PSNR values. This is also complicated by the fact thai it is nat- 
ural for two consecutive frames to have considerably different 
PSNR values, as evident from Figure 15. 

For these reasons, in the experimental evaluation of the video 
quality delivered by VTP, we concentrate on two parameters 
diat are easy to inierpreu the frame rate of the received video 
and the average values of the of the quantization parameters. We 
place a rather strict constraint on the player by configuring it to 
only display frames which are received completely intact. i.e., 
frames which have any errors due to packet loss aie discarded. 
This bolsters the importance of the p\sy out frame rate and mag- 
nifies the performance of VTP in terms of its key goal of provid- 
ing a stable frame rate through quantization scale adjustment 

Figure 16 contrasts die frame rate of the received "Atlantis" 
stream using VTP and non-adaptive streaming. By non-adaptive 



streaming, we mean the highest video rate is sent according to 
its transmission plan throughout the duration of the streaming 
session, regardless of network conditions. No bandwidth esti- 
mation or video quality changes are performed, and the send- 
ing rate changes only when dictated by the piecewise-constant 
transmission schedule developed for "Atlantis." The experimen- 
tal scenario is the sanne as in the previous section where VTP is 
competing with IS TCP flows on a -3 Mbps capacity link with 
a 10 millisecond RTT. The non-adaptive streaming flow is like- 
wise examined under the same conditions. The overall frame 
ratie of the encoded source video is 23.975 frames per second 
(fps) in both cases. At several instances, around times 7 and IS 
seconds, the non-adaptive frame rate drops below 15 fps, which 
is widely held to be the threshold of viewable video. With VTP, 
these severe decreases are avoided, and the frame rate is always 
in the range of 18 to 24 fps. 

Figure 17 depicts another representative example of the ad- 
vantage gained by VTP adaptivity. In this experiment, the con- 
ditions are those of the fourth case in Figure 14: 1 monitored 
flow (either VTP or non-adaptive streaming) sharing a 5 Mbps, 
10 ms RTT link with 1 1 competing TCP connections. As the 
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Figure 18: Average values of quantization parameters of the delivered 
"Adantis** stream. 



streaming session progresses, VTP discovers the fair share of 
available bandwidth and appropriately tunes to sending rate and 
video bitrate to avoid overflowing the router buffer. The result- 
ing frame rate of the VTP stream stabilizes with time, while the • 
frame rate of the non-adaptive stream increasingly oscillates to- 
ward the end of the segment, suffering from the effect of router 
packet drops. 

In Figure 18 we present the average values of the QPs of the 
"Atlantis" segment throughout the duration of the session. We 
show the 3 Mbps case from the experiment in the previous sec- 
tion together with the case of a 2 Mbps link. Hie plot verifies 
VTP adapts the outgoing video stream to fit the available net- 
work bamiwidth. When there is little contention for the link, 
e.g. 2 and 4 total connections, VTP chooses video primarily 
fipom the high quality, high bitrate stream (recall lower QP val- 
ues hnply less compression and higher quality). As the number 
of competmg TCP connections increases, the QP values consis- 
tently increase, indicating VTP lowering the qualiQf of the out- 
going video in response to congestion. This clearly illustrates 
the mechanism by which VTP attains adaplivity. VTP is also 
aware of the additional bandwidth afforded to it by the increase 
in link capacity from 2 to 3 Mbps. In the cases of 8, 16, and 32 
connections, VTP carefully chooses the highest quality outgoing 
stream that will fit its fair share of the available bandwidth. This 
leads to a QP reduction of between 3 and S, indicating higher 
quality video being sent when more bandwidth is available at 3 
Mbps. 

I 

% 

In this paper we designed, implemented and tested a new pro- 
tocol to stream compressed streaming video in real-time. A dis- 
tinct feature of VTP is the use of bandwidth estimation to adapt 
the sending rate and the ^deo encoding in response to changes 
in network conditions. We developed VTP In concert with open 
standards fox video compression and file formats, and built a 
plug-in for a widely used video player to serve as the VTP re- 
ceiver. We have made an effort to make VTP easily extensible. 



VTP was evaluated in a controlled network environment un- 
der a variety of link speeds and background U^c. Experimen- 
tal results show that VTP offers considemble gains over non- 
adaptive streaming in effective frame rate. To a large extent, 
VTP behaves fairly toward TCP when both protocols compete 
in a congested network. 

We found that VTP fairness toward TCP is vulnerable if the 
lowest video bitrate Is higher than the average link fair share 
available to VTP. A priori knowledge of the general link capac- 
ity and typical networic utilization can be exu-emely useful in 
the selection and configtumion of the video sources for VTP. 
We believe this information is usually not difficult to obtain for 
' administrators, and that a small amount of careful manual con- 
figuration is a reasonable price for the advant^es of VTP. 

7 Future Work 

For any stteaming system to be fully useful, audio and video 
must be multiplexed into the data stream and synchronized dur- 
ing play out. A near term goal is to include the capability to 
adaptively su-eam audio and video in combination under the VTP 
protocol. 

We will also further investigate the effect of changing the 
k parameter, the number of packets used to compute: a single 
bandwidth sample. We plan to implement an algorithm to dy- 
namically adjust k during streaming to improve VTP's efficiency 
and fairness with TCP. Another advants^e would be reducing the 
amount of manual user configuration required. 
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